Automatic information extraction from the Web

نویسندگان

  • David Sánchez
  • Antonio Moreno
چکیده

The Web is a valuable repository of information. However, its size and its lack of structure difficult the search and extraction of knowledge. In this paper, we propose an automatic and autonomous methodology to retrieve and represent information from the Web in a standard way for a desired domain. It is based on the intensive use of a publicly available search engine and the analysis of a large quantity of web resources for detecting the information relevance. Polysemy detection and aliases/synonyms discovering of the evaluated domain are also considered. Results can be very useful for easing the access of the user to the Web resources or allowing computer processing of the data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

متن کامل

Automatic Lane Extraction in Hemoglobin and Serum Protein Electrophoresis Using Image Processing

Image analysis is an image processing technique that aims to extract features or information from images. Image analysis in medicine has a special place because is a basis for disease diagnosis for physicians. Electrophoresis is a laboratory separating technique. Electrophoresis images are created during the electrophoresis process. Serum protein and hemoglobin electrophoresis test are the ...

متن کامل

Automatic Lane Extraction in Hemoglobin and Serum Protein Electrophoresis Using Image Processing

Image analysis is an image processing technique that aims to extract features or information from images. Image analysis in medicine has a special place because is a basis for disease diagnosis for physicians. Electrophoresis is a laboratory separating technique. Electrophoresis images are created during the electrophoresis process. Serum protein and hemoglobin electrophoresis test are the ...

متن کامل

Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images

As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...

متن کامل

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

متن کامل

Automatic Extraction of Information from the Web

The semantic Web will bring meaning to the Internet, making it possible for web agents to understand the information it contains. However, current trends seem to suggest that the semantic web is not likely to be adopted in the forthcoming years. In this sense, meaningful information extraction from the web becomes a handicap for web agents. In this article, we present a framework for automatic ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004